tf.train provides a set of classes and functions that help train models.
The Optimizer base class provides methods to compute gradients for a loss and apply gradients to variables. A collection of subclasses implements classic optimization algorithms such as GradientDescent and Adagrad.
You never instantiate the Optimizer class itself; instead you instantiate one of its subclasses (a usage sketch follows the list below).
tf.train.Optimizer // CLASS providing the API and ops to train a model. Always used through one of its subclasses.
# Create an optimizer with the desired parameters.
opt = tf.train.GradientDescentOptimizer(learning_rate=0.1)
tf.train.GradientDescentOptimizer // Implements the gradient descent algorithm.
tf.train.AdadeltaOptimizer // Adadelta algorithm.
tf.train.AdagradOptimizer // Adagrad algorithm.
tf.train.AdagradDAOptimizer // Adagrad Dual Averaging (AdagradDA) algorithm; takes care of regularization within a minibatch. Used where large sparsity is needed; guarantees sparsity for linear models.
tf.train.MomentumOptimizer // Momentum algorithm.
tf.train.AdamOptimizer // Adam algorithm. https://arxiv.org/abs/1412.6980
tf.train.FtrlOptimizer // FTRL algorithm. https://www.eecs.tufts.edu/~dsculley/papers/ad-click-prediction.pdf
tf.train.ProximalGradientDescentOptimizer // Proximal gradient descent algorithm. http://papers.nips.cc/paper/3793-efficient-learning-using-forward-backward-splitting.pdf
tf.train.ProximalAdagradOptimizer // Proximal Adagrad algorithm.
tf.train.RMSPropOptimizer // RMSProp algorithm. http://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf
See tf.contrib.opt for more optimizers.
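A minimal usage sketch; `loss` is an assumed scalar loss tensor already in your graph. You can either call minimize directly, or split it into compute_gradients and apply_gradients when you want to inspect or transform the gradients first.
import tensorflow as tf
opt = tf.train.GradientDescentOptimizer(learning_rate=0.1)
# One-step form: compute and apply the gradients in a single op.
train_op = opt.minimize(loss)
# Two-step form: inspect or transform the gradients before applying them.
grads_and_vars = opt.compute_gradients(loss)  # list of (gradient, variable) pairs
capped = [(tf.clip_by_value(g, -1.0, 1.0), v) for g, v in grads_and_vars if g is not None]
train_op = opt.apply_gradients(capped)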
TensorFlow provides functions to compute the derivatives for a given TensorFlow computation graph, adding operations to the graph. The optimizer classes compute derivatives on your graph automatically, but creators of new optimizers or expert users can call the lower-level functions below (a small sketch follows the list).
tf.gradients(ys, xs, grad_ys=None, name='gradients', colocate_gradients_with_ops=False, gate_gradients=False, aggregation_method=None, stop_gradients=None) // Constructs symbolic derivatives of ys with respect to xs. Returns a list of sum(dy/dx) for each x in xs.
tf.AggregationMethod // Lists the methods that can be used to aggregate gradient contributions when computing partial derivatives.
tf.stop_gradient // Useful when you need to compute a value with TF but must treat it as a constant during backprop, e.g. the EM algorithm, contrastive divergence training of Boltzmann machines, or adversarial training where no backprop should happen through the adversarial example construction.
tf.hessians // Adds operations to the graph to output the Hessian matrix of ys with respect to xs.
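A small sketch of the lower-level gradient functions; the variables and expressions are illustrative.
import tensorflow as tf
x = tf.Variable(3.0)
y = tf.Variable(2.0)
z = x * x * y + y
# Symbolic derivatives dz/dx and dz/dy; tf.gradients returns one summed gradient per x in xs.
dz_dx, dz_dy = tf.gradients(ys=z, xs=[x, y])
# Treat y as a constant: no gradient flows through tf.stop_gradient(y).
z2 = x * x * tf.stop_gradient(y)
[dz2_dx] = tf.gradients(ys=z2, xs=[x])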
TensorFlow provides several operations that you can use to add clipping functions to your graph. You can use these functions to perform general data clipping, but they're particularly useful for handling exploding or vanishing gradients (see the sketch after this list).
tf.clip_by_value // Clips tensor values to a specified min and max.
tf.clip_by_norm // Clips tensor values to a maximum L2-norm.
tf.clip_by_average_norm // Clips tensor values to a maximum average L2-norm.
tf.clip_by_global_norm // Clips values of multiple tensors by the ratio of the sum of their norms.
tf.global_norm // global_norm = sqrt(sum([l2norm(t)**2 for t in t_list]))
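A common pattern is global-norm gradient clipping between compute_gradients and apply_gradients. A sketch, assuming a scalar `loss` tensor already exists in the graph; the clip norm is illustrative.
import tensorflow as tf
opt = tf.train.AdamOptimizer(learning_rate=1e-3)
grads_and_vars = opt.compute_gradients(loss)  # `loss` is assumed to exist in the graph
grads, variables = zip(*grads_and_vars)
# Rescale all gradients together so that their global norm is at most 5.0.
clipped, global_norm = tf.clip_by_global_norm(grads, clip_norm=5.0)
train_op = opt.apply_gradients(list(zip(clipped, variables)))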
Decaying the learning rate
tf.train.exponential_decay // decayed_learning_rate = learning_rate * decay_rate ^ (global_step / decay_steps)
tf.train.inverse_time_decay // decayed_learning_rate = learning_rate / (1 + decay_rate * global_step / decay_steps)
tf.train.natural_exp_decay // decayed_learning_rate = learning_rate * exp(-decay_rate * global_step / decay_steps)
tf.train.piecewise_constant // Piecewise constant learning rate from provided boundaries and values.
tf.train.polynomial_decay // Polynomial decay from the initial learning rate to end_learning_rate.
tf.train.cosine_decay // Applies cosine decay to the learning rate.
tf.train.linear_cosine_decay // Applies linear cosine decay to the learning rate.
tf.train.noisy_linear_cosine_decay // Applies noisy linear cosine decay to the learning rate.
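These schedules take a global_step tensor and produce a learning-rate tensor that is passed to the optimizer. A sketch with exponential_decay; the hyperparameter values are illustrative and `loss` is assumed to exist in the graph.
import tensorflow as tf
global_step = tf.train.get_or_create_global_step()
# Start at 0.1 and multiply by 0.96 every 10000 steps (staircase=True decays in discrete jumps).
learning_rate = tf.train.exponential_decay(learning_rate=0.1, global_step=global_step,
                                           decay_steps=10000, decay_rate=0.96, staircase=True)
opt = tf.train.GradientDescentOptimizer(learning_rate)
# Passing global_step makes minimize() increment it after each update.
train_op = opt.minimize(loss, global_step=global_step)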
Some training algorithms, such as GradientDescent and Momentum, often benefit from maintaining a moving average of variables during optimization. Using the moving averages for evaluation often improves results significantly.
tf.train.ExponentialMovingAverage
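A sketch of the usual pattern: update shadow averages after each training step and restore them at evaluation time. `train_op` and the trainable variables are assumed to exist already; the decay value is illustrative.
import tensorflow as tf
ema = tf.train.ExponentialMovingAverage(decay=0.999)
# Run the shadow-variable update after every training step.
with tf.control_dependencies([train_op]):
    train_with_averages_op = ema.apply(tf.trainable_variables())
# At evaluation time, load the shadow values in place of the raw variables.
eval_saver = tf.train.Saver(ema.variables_to_restore())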
Coordinator and QueueRunner
See Threading and Queues for how to use threads and queues. For documentation on the Queue API, see Queues. A sketch of the usual pattern follows the list below.
tf.train.Coordinator // See docs
tf.train.QueueRunner // See docs
tf.train.LooperThread // See docs
tf.train.add_queue_runner // See docs
tf.train.start_queue_runners // See docs
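The standard pattern starts the queue-runner threads under a Coordinator and shuts them down cleanly. A sketch; a queue-based input pipeline and `train_op` are assumed to be defined in the graph.
import tensorflow as tf
with tf.Session() as sess:
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)  # one thread per QueueRunner
    try:
        while not coord.should_stop():
            sess.run(train_op)
    except tf.errors.OutOfRangeError:
        pass  # the input queues ran out of data
    finally:
        coord.request_stop()
        coord.join(threads)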
See Distributed TensorFlow for more information about how to configure a distributed TensorFlow program; a minimal sketch follows the list below.
tf.train.Server // See docs
tf.train.Supervisor // See docs
tf.train.SessionManager // See docs
tf.train.ClusterSpec // See docs
tf.train.replica_device_setter // See docs
tf.train.MonitoredTrainingSession // See docs
tf.train.MonitoredSession // See docs
tf.train.SingularMonitoredSession // See docs
tf.train.Scaffold // See docs
tf.train.SessionCreator // See docs
tf.train.ChiefSessionCreator // See docs
tf.train.WorkerSessionCreator // See docs
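A hedged sketch of a between-graph replicated setup; the host addresses, job name, task index, and toy model are placeholders.
import tensorflow as tf
# Placeholder cluster definition; replace with your own hosts and ports.
cluster = tf.train.ClusterSpec({
    "ps": ["ps0.example.com:2222"],
    "worker": ["worker0.example.com:2222", "worker1.example.com:2222"],
})
server = tf.train.Server(cluster, job_name="worker", task_index=0)
# Variables are placed on the parameter servers, ops on this worker.
with tf.device(tf.train.replica_device_setter(cluster=cluster)):
    global_step = tf.train.get_or_create_global_step()
    w = tf.Variable(1.0, name="w")              # toy model
    loss = tf.square(w - 3.0)
    train_op = tf.train.GradientDescentOptimizer(0.1).minimize(loss, global_step=global_step)
# MonitoredTrainingSession handles session creation, recovery, and checkpointing.
with tf.train.MonitoredTrainingSession(master=server.target, is_chief=True) as sess:
    while not sess.should_stop():
        sess.run(train_op)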
Reading Summaries from Event Files
See Summaries and TensorBoard for an overview of summaries, event files, and visualization in TensorBoard.
tf.train.summary_iterator
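A sketch for reading scalar summaries back out of an event file; the file path is a placeholder.
import tensorflow as tf
# Placeholder path to an event file written by a tf.summary.FileWriter.
event_file = "/tmp/train/events.out.tfevents.1234567890.myhost"
for event in tf.train.summary_iterator(event_file):
    for value in event.summary.value:
        if value.HasField("simple_value"):  # scalar summaries only
            print(event.step, value.tag, value.simple_value)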
Hooks are tools that run during training or evaluation of the model, typically by attaching them to a MonitoredSession (see the sketch after this list).
tf.train.SessionRunHook // See docs
tf.train.SessionRunArgs // See docs
tf.train.SessionRunContext // See docs
tf.train.SessionRunValues // See docs
tf.train.LoggingTensorHook // See docs
tf.train.StopAtStepHook // See docs
tf.train.CheckpointSaverHook // See docs
tf.train.NewCheckpointReader // See docs
tf.train.StepCounterHook // See docs
tf.train.NanLossDuringTrainingError // See docs
tf.train.NanTensorHook // See docs
tf.train.SummarySaverHook // See docs
tf.train.GlobalStepWaiterHook // See docs
tf.train.FinalOpsHook // See docs
tf.train.FeedFnHook // See docs
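A sketch of passing hooks to a MonitoredTrainingSession; `loss` and `train_op` are assumed to exist, and the step count, logging interval, and checkpoint directory are placeholders.
import tensorflow as tf
hooks = [
    tf.train.StopAtStepHook(last_step=10000),                      # stop after 10000 global steps
    tf.train.LoggingTensorHook({"loss": loss}, every_n_iter=100),  # log the loss periodically
    tf.train.NanTensorHook(loss),                                  # raise if the loss becomes NaN
]
# Providing checkpoint_dir enables checkpoint and summary saving hooks automatically.
with tf.train.MonitoredTrainingSession(checkpoint_dir="/tmp/train", hooks=hooks) as sess:
    while not sess.should_stop():
        sess.run(train_op)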
Training Utilities
tf.train.global_step // See docs
tf.train.basic_train_loop // See docs
tf.train.get_global_step // See docs
tf.train.assert_global_step // See docs
tf.train.write_graph // See docs
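A small sketch of the global-step helpers and write_graph; the output directory is a placeholder and the model-building step is elided.
import tensorflow as tf
global_step_tensor = tf.train.get_or_create_global_step()
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # Read the current value of the global step from the session.
    step = tf.train.global_step(sess, global_step_tensor)
    # Serialize the graph definition to a text protobuf for inspection.
    tf.train.write_graph(sess.graph_def, "/tmp/train", "graph.pbtxt")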
Splitting sequence inputs into minibatches with state saving
Use tf.contrib.training.SequenceQueueingStateSaver or its wrapper tf.contrib.training.batch_sequences_with_states if you have input data with a dynamic primary time / frame count axis which you'd like to convert into fixed size segments during minibatching, and would like to store state in the forward direction across segments of an example.
* tf.contrib.training.batch_sequences_with_states(input_key, input_sequences, input_context, input_length, initial_states, num_unroll, batch_size, num_threads=3, capacity=1000, allow_small_batch=True, pad=True, make_keys_unique=False, make_keys_unique_seed=None, name=None)
// Creates batches from segments of sequential input.
* tf.contrib.training.NextQueuedSequenceBatch // CLASS that stores a deferred SequenceQueueingStateSaver's data.
* tf.contrib.training.SequenceQueueingStateSaver // CLASS used instead of a queue to split variable-length sequences into fixed-length segments and batch them into mini-batches.
Online data resampling
To resample data with replacement on a per-example basis, use tf.contrib.training.rejection_sample or tf.contrib.training.resample_at_rate.
For rejection_sample, provide a boolean Tensor describing whether to accept or reject. Resulting batch sizes are always the same.
For resample_at_rate, provide the desired rate for each example. Resulting batch sizes may vary.
If you wish to specify relative rates, rather than absolute ones, use tf.contrib.training.weighted_resample (which also returns the actual resampling rate used for each output example).
Use tf.contrib.training.stratified_sample to resample without replacement from the data to achieve a desired mix of class proportions that the TensorFlow graph sees. For instance, if you have a binary classification dataset that is 99.9% class 1, a common approach is to resample from the data so that the data is more balanced (a sketch follows the list below).
* tf.contrib.training.rejection_sample(tensors, accept_prob_fn, batch_size, queue_threads=1, enqueue_many=False, prebatch_capacity=16, prebatch_threads=1, runtime_checks=False, name=None)
// Creates batches by rejecting samples not accepted by a function.
* tf.contrib.training.resample_at_rate(inputs, rates, scope=None, seed=None, back_prop=False)
// Resamples inputs at the given rates, returning a new set of resampled inputs.
* tf.contrib.training.stratified_sample(tensors, labels, target_probs, batch_size, init_probs=None, enqueue_many=False, queue_capacity=16, threads_per_queue=1, name=None)
// Creates batches based on per-class probabilities.
* tf.contrib.training.weighted_resample(inputs, weights, overall_rate, scope=None, mean_decay=0.999, seed=None)
// An approximate weighted resampling of inputs; chooses inputs where the rate of selection is proportional to the weights.
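A hedged sketch of stratified_sample for rebalancing a two-class stream; the `example` and `label` tensors, class mixes, and batch size are illustrative assumptions.
import tensorflow as tf
# `example` and `label` are assumed to come from a per-example input pipeline;
# `label` is an int32 class id in {0, 1}.
target_probs = [0.5, 0.5]    # desired class mix of the output batches
init_probs = [0.999, 0.001]  # approximate class mix of the input stream
[example_batch], label_batch = tf.contrib.training.stratified_sample(
    tensors=[example], labels=label, target_probs=target_probs,
    batch_size=32, init_probs=init_probs, enqueue_many=False)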
Use tf.contrib.training.bucket or tf.contrib.training.bucket_by_sequence_length to stratify minibatches into groups ("buckets").
Use bucket_by_sequence_length with the argument dynamic_pad=True to receive minibatches of similarly sized sequences for efficient training via dynamic_rnn (see the sketch after this list).
* tf.contrib.training.bucket(tensors, which_bucket, batch_size, num_buckets, num_threads=1, capacity=32, bucket_capacities=None, shapes=None, dynamic_pad=False, allow_smaller_final_batch=False, keep_input=True, shared_name=None, name=None)
// Lazy bucketing of input tensors according to which_bucket.
* tf.contrib.training.bucket_by_sequence_length(input_length, tensors, batch_size, bucket_boundaries, num_threads=1, capacity=32, bucket_capacities=None, shapes=None, dynamic_pad=False, allow_smaller_final_batch=False, keep_input=True, shared_name=None, name=None)
// Lazy bucketing of inputs according to their length. Calls tf.contrib.training.bucket and, after subdividing the bucket boundaries, identifies which bucket an input_length belongs to and uses that.
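A hedged sketch of bucket_by_sequence_length; the `sequence` and `label` tensors, bucket boundaries, and batch size are illustrative assumptions.
import tensorflow as tf
# `sequence` and `label` are assumed to come from a per-example input pipeline;
# `sequence` has shape [time, feature_dim] with a dynamic time axis.
seq_len = tf.shape(sequence)[0]
# Group examples by length using the boundaries [10, 20, 40]; with dynamic_pad=True
# each batch is padded to the longest sequence in its bucket.
batched = tf.contrib.training.bucket_by_sequence_length(
    input_length=seq_len,
    tensors=[sequence, label],
    batch_size=32,
    bucket_boundaries=[10, 20, 40],
    dynamic_pad=True)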